We already used some graph metrics in the previous tutorial. Here we will cover graph metrics and features in more detail, as well as usage of the Brain Connectivity Toolbox.
Here we use the UCLA autism dataset, publicly available at the UCLA Multimodal Connectivity Database. The data include DTI-based connectivity matrices of 51 high-functioning ASD subjects (6 females) and 43 TD subjects (7 females).
In [1]:
from reskit.datasets import load_UCLA_data
X, y = load_UCLA_data()
X = X['matrices']
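To get a feel for the data, we can check the array shapes and the labels. This quick check assumes X is an array-like of square connectivity matrices and y contains one binary group label per subject:

import numpy as np

print(np.asarray(X).shape)   # expected: (n_subjects, n_nodes, n_nodes)
print(np.asarray(y).shape)   # one ASD/TD label per subject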
We can normalize the connectivity matrices and extract some features from them.
In [2]:
from reskit.normalizations import mean_norm
from reskit.features import bag_of_edges
from reskit.core import MatrixTransformer
normalized_X = MatrixTransformer(
    func=mean_norm).fit_transform(X)

featured_X = MatrixTransformer(
    func=bag_of_edges).fit_transform(normalized_X)
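Roughly speaking, a bag-of-edges featurization flattens the edge weights of each connectivity matrix into a single feature vector. Below is a minimal NumPy sketch of the idea, not Reskit's exact implementation (check the Reskit documentation for the details):

import numpy as np

def bag_of_edges_sketch(matrix):
    # For a symmetric connectivity matrix, keep only the strictly
    # upper-triangular edge weights and flatten them into a vector.
    rows, cols = np.triu_indices_from(matrix, k=1)
    return matrix[rows, cols]

# One feature vector per subject (assuming normalized_X iterates over matrices).
sketch_X = np.array([bag_of_edges_sketch(m) for m in normalized_X])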
We provide some basic graph metrics in Reskit. To access more state-of-the-art graph metrics you can use the Brain Connectivity Toolbox. You should install it via pip:
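pip install bctpy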
Let's calculate the PageRank centrality of a random graph using the BCT Python library.
In [3]:
from bct.algorithms.centrality import pagerank_centrality
import numpy as np
pagerank_centrality(np.random.rand(3, 3), d=0.85)
Out[3]:
Now we calculate this metric for the UCLA dataset. Here d is a pagerank_centrality parameter called the damping factor (see the bctpy documentation for more info).
In [4]:
featured_X = MatrixTransformer(
    d=0.85,
    func=pagerank_centrality).fit_transform(X)
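After this transform each subject is represented by a vector of node-wise PageRank values instead of a full matrix, so featured_X should be two-dimensional (subjects × nodes); a quick sanity check:

print(featured_X.shape)  # expected: (n_subjects, n_nodes)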
Now suppose we want to try the pagerank_centrality and degrees features with SVM and LogisticRegression classifiers. We define all the step variants, and Reskit's Pipeliner will evaluate each featurizer/classifier combination.
In [5]:
from bct.algorithms.degree import degrees_und
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from reskit.core import Pipeliner
# Feature extraction step variants (1st step)
featurizers = [('pagerank', MatrixTransformer(
                    d=0.85,
                    func=pagerank_centrality)),
               ('degrees', MatrixTransformer(
                    func=degrees_und))]

# Models (2nd step)
classifiers = [('LR', LogisticRegression()),
               ('SVC', SVC())]

# Reskit requires the steps to be defined in this manner
steps = [('featurizer', featurizers),
         ('classifier', classifiers)]

# Grid search parameters for our models
param_grid = {'LR': {'penalty': ['l1', 'l2']},
              'SVC': {'kernel': ['linear', 'poly', 'rbf', 'sigmoid']}}

# Quality metric that we want to optimize
scoring = 'roc_auc'

# Setting cross-validations
grid_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
eval_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

pipe = Pipeliner(steps=steps, grid_cv=grid_cv, eval_cv=eval_cv, param_grid=param_grid)
pipe.plan_table
Out[5]:
In [6]:
pipe.get_results(X, y, scoring=scoring, caching_steps=['featurizer'])
Out[6]:
These are the main points about machine learning on graphs. You can now try a large number of normalizations, features and classifiers for graph classification. In case you need something specific, you can implement a custom pipeline step to figure out how it influences the result.
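For instance, a custom feature-extraction variant can be plugged into the featurizers list through MatrixTransformer with your own function. A minimal sketch (the node_strength helper below is hypothetical, not part of Reskit or BCT):

def node_strength(matrix):
    # Hypothetical custom feature: weighted degree of each node,
    # i.e. the sum of its edge weights.
    return matrix.sum(axis=1)

featurizers = [('pagerank', MatrixTransformer(d=0.85, func=pagerank_centrality)),
               ('degrees', MatrixTransformer(func=degrees_und)),
               ('strength', MatrixTransformer(func=node_strength))]

Re-running Pipeliner with this extended list should add two more pipelines (strength with LR and with SVC) to the plan table, making it easy to compare the new step against the existing ones.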